Troy Mazerolle
Student Number 8972394
# Utility Libraries
import numpy as np
import pandas as pd
# Graphing Libraries
import plotly
import plotly.graph_objects as go
import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sb
plotly.offline.init_notebook_mode()
For my first graph, I want to demonstrate how to use Plotly to plot the historical prices of a stock as a candlestick chart. I chose to do this because one of my goals for this program is to learn how to apply machine learning methods to financial markets.
For this demonstration, I will be using the historical price data of Unity Software (ticker U), the company behind the Unity game development engine. I chose Unity simply because I am currently learning how to use the Unity engine.
unity = pd.read_csv("U.csv")
fig = go.Figure(data = [go.Candlestick(x = unity['Date'],
                                       open = unity['Open'],
                                       high = unity['High'],
                                       low = unity['Low'],
                                       close = unity['Close'])])
fig.update_layout(
    title = "Unity Stock Prices",
    yaxis_title = "Price",
    xaxis_title = "Date",
    xaxis_rangeslider_visible = False
)
fig.show()
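A candlestick can be read as follows. Each candle represents one trading day. The thick body spans the day's opening and closing prices, and the thin wicks extend up to the day's high and down to the day's low. With Plotly's default colors, a green candle means the stock closed higher than it opened, and a red candle means it closed lower.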
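The parts of a single candle can also be computed directly from one day's open, high, low, and close. The sketch below uses a hypothetical helper, `describe_candle`, on made-up prices. It only illustrates the reading rules and is not part of the Plotly API.

```python
def describe_candle(open_, high, low, close):
    """Split one OHLC candle into its direction, body, and wicks (illustrative sketch)."""
    body_top, body_bottom = max(open_, close), min(open_, close)
    return {
        # "up" candles close above the open; "down" candles close below it
        "direction": "up" if close > open_ else "down" if close < open_ else "flat",
        "body": body_top - body_bottom,       # distance between open and close
        "upper_wick": high - body_top,        # how far the high rose above the body
        "lower_wick": body_bottom - low,      # how far the low fell below the body
    }

# Hypothetical day: opened at 10, traded between 9 and 12, closed at 11
print(describe_candle(10, 12, 9, 11))
# {'direction': 'up', 'body': 1, 'upper_wick': 1, 'lower_wick': 1}
```

For a real row of the Unity data, the call would be `describe_candle(*unity.loc[0, ['Open', 'High', 'Low', 'Close']])`. This assumes the column order matches the function's parameters.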
The Relative Strength Index (RSI) is a momentum indicator that measures how overbought or oversold a stock is. The formula for RSI is $RSI = 100 - \frac{100}{1 + \frac{n_{up}}{n_{down}}}$, where $n_{up}$ and $n_{down}$ are the average gain and average loss, respectively, over the period. In this case we are using a period of 10, which means that the RSI at date X is calculated from the 10 daily returns ending at X. A general rule of thumb is that when the RSI is above 70, the stock is considered overbought, which is a signal that the stock price may start going down. Conversely, when the RSI is below 30, the stock is considered oversold, which is a signal that the stock price may start increasing.
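As a quick check of the formula, the sketch below plugs made-up average gains and losses into it. The helper name `rsi_from_averages` is hypothetical. When there are no losses, the sketch returns 100, because $\frac{n_{up}}{n_{down}}$ grows without bound and the fraction vanishes.

```python
def rsi_from_averages(avg_gain, avg_loss):
    """RSI = 100 - 100 / (1 + n_up / n_down) for given average gain and loss (sketch)."""
    if avg_loss == 0:
        return 100.0  # no losses: the ratio is unbounded, so RSI takes its maximum value
    rs = avg_gain / avg_loss  # relative strength n_up / n_down
    return 100 - 100 / (1 + rs)

print(rsi_from_averages(1.5, 1.0))  # 60.0, between the two thresholds
print(rsi_from_averages(3.0, 1.0))  # 75.0, above 70, so "overbought"
```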
First, we need to write a function that takes in the data of the stock and the reference period, and outputs the RSI values:
def rsi(stockData, period):
    # Intraday return for each day: (close - open) / open
    openPrices = stockData['Open']
    closePrices = stockData['Close']
    returns = ((closePrices - openPrices) / openPrices).to_numpy()
    rsIndexes = []
    # Slide a window of `period` returns; the window ending at index i - 1 gives that day's RSI.
    # Sums are used instead of averages: both would be divided by `period`, so the ratio is unchanged.
    for i in range(period, len(returns) + 1):
        window = returns[(i - period):i]
        totalGain = window[window >= 0].sum()
        totalLoss = abs(window[window < 0].sum())
        if totalLoss == 0:
            rsIndexes.append(100.0)  # no losses in the window, so RSI is at its maximum
        else:
            rsIndexes.append(100 - (100 / (1 + totalGain / totalLoss)))
    return np.array(rsIndexes)
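The function above measures gains and losses as intraday (open-to-close) returns and sums them over a plain sliding window. J. Welles Wilder's original formulation is the one most charting tools quote. It uses changes between consecutive closing prices and smooths the averages exponentially with $\alpha = 1/\text{period}$. The sketch below approximates that formulation with pandas `ewm`. The name `wilder_rsi` is hypothetical. The sketch seeds the averages from the first observation rather than from a simple average of the first `period` changes, so its early values may differ slightly from those of trading platforms.

```python
import numpy as np
import pandas as pd

def wilder_rsi(close: pd.Series, period: int = 14) -> pd.Series:
    """RSI from close-to-close changes with Wilder-style exponential smoothing (sketch)."""
    delta = close.diff().iloc[1:]       # change from the previous close; the first row has none
    gains = delta.clip(lower=0)         # upward moves, 0 otherwise
    losses = -delta.clip(upper=0)       # downward moves as positive numbers, 0 otherwise
    # Wilder's smoothing is an exponential moving average with alpha = 1 / period
    avg_gain = gains.ewm(alpha=1 / period, adjust=False, min_periods=period).mean()
    avg_loss = losses.ewm(alpha=1 / period, adjust=False, min_periods=period).mean()
    rs = avg_gain / avg_loss            # inf when there are no losses, which gives RSI = 100
    return 100 - 100 / (1 + rs)
```

In the notebook, this would be called as `wilder_rsi(unity['Close'], 10)`, assuming the CSV's `Close` column. The first `period - 1` values are `NaN` because there is not yet enough history.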
rsiValues = rsi(unity, 10)
Using matplotlib, we can now plot the relative strength index of Unity stock. We will also add horizontal lines at $RSI = 30$ and $RSI = 70$ to indicate when the stock might be oversold or overbought.
dates = unity['Date'][(len(unity['Date']) - len(rsiValues)):len(unity['Date'])] # Getting the dates that correspond with the values in RSI
fig, ax = plt.subplots()
ax.plot(dates, rsiValues)
plt.axhline(y = 70, color = 'g', linestyle = '--', linewidth = 2)
plt.axhline(y = 30, color = 'r', linestyle = '--', linewidth = 2)
plt.title("Relative Strength Index of Unity")
plt.xlabel('Date')
plt.ylabel('RSI')
The dates on the x-axis overlap and are difficult to read, but we can see that the stock was oversold around the middle of the period. At the end of the period, the stock is overbought, which suggests that the price might go down.
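The labels overlap because the `Date` column is read from the CSV as strings. Matplotlib then treats each string as a separate category and gives every one of them a tick. Converting the column with `pd.to_datetime` should let matplotlib pick a sparse date axis instead. The sketch below demonstrates this on placeholder data; the dates and RSI values are randomly generated and are not Unity's.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs outside a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Placeholder data only: 250 business days as strings (like the CSV) and random RSI values
placeholderDates = pd.Series(pd.date_range("2023-01-02", periods=250, freq="B").strftime("%Y-%m-%d"))
placeholderRsi = np.random.default_rng(0).uniform(20, 80, len(placeholderDates))

fig, ax = plt.subplots()
# Parsing the strings into datetimes gives a real date axis instead of one category per day
ax.plot(pd.to_datetime(placeholderDates), placeholderRsi)
ax.axhline(y=70, color='g', linestyle='--', linewidth=2)
ax.axhline(y=30, color='r', linestyle='--', linewidth=2)
ax.set_title("RSI with a date axis (placeholder data)")
ax.set_xlabel('Date')
ax.set_ylabel('RSI')
fig.autofmt_xdate()  # rotate and right-align the date labels so they do not collide
```

In the RSI cell itself, the equivalent change would be to pass `pd.to_datetime(dates)` to `ax.plot` and to call `fig.autofmt_xdate()`.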
For my next chart, I want to compare the total gains and total losses between Microsoft and AMD. I chose these two stocks because Microsoft is considered to be a low-risk stock, while AMD is considered to be a high-risk stock. Using a bar chart, I want to visually show the differences between profits and losses.
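In order to plot this efficiently, we need to organize the data into a single table that the barplot function can read directly. To do this, we will:
1. Compute the daily (open-to-close) return of each stock.
2. Store each stock's returns in a DataFrame with a `Symbol` column identifying the stock.
3. Stack the two DataFrames into one.
4. Add a `Direction` column recording whether each return is positive or negative.
5. Replace each return with its absolute value, so that gains and losses can both be summed as positive bar heights.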
microsoft = pd.read_csv("MSFT.csv")
amd = pd.read_csv("AMD.csv")
microsoftReturns = (microsoft['Close'] - microsoft['Open']) / microsoft['Open']
amdReturns = (amd['Close'] - amd['Open']) / amd['Open']
microsoftDF = pd.DataFrame(data = {'Symbol': ['MSFT'] * len(microsoftReturns),
'Return': microsoftReturns})
amdDF = pd.DataFrame(data = {'Symbol': ['AMD'] * len(amdReturns),
'Return': amdReturns})
# Stack both tables and renumber the rows 0..n-1
stockData = pd.concat([microsoftDF, amdDF], axis = 0, ignore_index = True)
# Label each day by the sign of its return, then keep only the magnitude
stockData['Direction'] = np.where(stockData['Return'] >= 0, "POSITIVE", "NEGATIVE")
stockData['Return'] = stockData['Return'].abs()
print(stockData)
    Symbol    Return Direction
0     MSFT  0.001965  POSITIVE
1     MSFT  0.008455  POSITIVE
2     MSFT  0.001570  POSITIVE
3     MSFT  0.021779  NEGATIVE
4     MSFT  0.013074  POSITIVE
..     ...       ...       ...
497    AMD  0.018636  NEGATIVE
498    AMD  0.010653  POSITIVE
499    AMD  0.024834  POSITIVE
500    AMD  0.016601  NEGATIVE
501    AMD  0.043179  NEGATIVE

[502 rows x 3 columns]
Now that we have all the columns we need, we move on to graphing the returns.
sb.barplot(data = stockData, x = "Symbol", y = "Return", hue = "Direction", estimator = sum, errorbar = None)
From the bar graph, we can see that both MSFT and AMD appear profitable overall: for each stock, the total of the positive daily returns is larger than the total of the negative ones. AMD has significantly higher total gains, but it also has significantly higher total losses. This larger swing in both directions is why a stock like AMD is generally considered high-risk, while a stock like Microsoft is generally considered low-risk. Both stocks appear to have about the same net return (total gains minus total losses), with AMD's slightly higher, but Microsoft's returns are less volatile.
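The visual comparison can be checked numerically by restoring each return's sign and aggregating per symbol. The sketch below does this with a hypothetical helper, `summarize_returns`, which expects the `Symbol`, `Return`, and `Direction` columns built above. It runs here on a tiny made-up table, not the real MSFT/AMD data. Two caveats apply. First, the net return is a plain sum of daily returns, which only approximates compounded growth. Second, the standard deviation of daily returns is used as the volatility measure.

```python
import numpy as np
import pandas as pd

def summarize_returns(stockData: pd.DataFrame) -> pd.DataFrame:
    """Per-symbol total gain, total loss, net return, and volatility (sketch)."""
    # Undo the abs() step: negative days get their sign back
    signed = np.where(stockData['Direction'] == "POSITIVE", stockData['Return'], -stockData['Return'])
    return (stockData.assign(Signed=signed)
            .groupby('Symbol')['Signed']
            .agg(totalGain=lambda s: s[s > 0].sum(),
                 totalLoss=lambda s: -s[s < 0].sum(),
                 netReturn='sum',       # simple sum of daily returns, not compounded
                 volatility='std'))     # sample standard deviation of signed daily returns

# Made-up example rows in the same layout as stockData
example = pd.DataFrame({
    'Symbol': ['MSFT', 'MSFT', 'AMD', 'AMD'],
    'Return': [0.01, 0.005, 0.04, 0.03],
    'Direction': ['POSITIVE', 'NEGATIVE', 'POSITIVE', 'NEGATIVE'],
})
print(summarize_returns(example))
```

On the notebook's data, the call would be `summarize_returns(stockData)`.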
This last plot will not be a finance plot. However, I did want to practice plotting various functions and lines of best fit, especially since we will be studying linear regression shortly.
We will start by generating some data. We will use the line $y = 3x + 4$ and add normally distributed noise with mean $\mu = 0$ and standard deviation $\sigma = 50$ to each value. We will then draw the noisy points as a scatterplot and overlay the true line on the same axes.
func = lambda x: 3 * x + 4
xvals = np.arange(0, 101, 1)
mean = 0
std = 50
yvals = func(xvals) + np.random.normal(mean, std, xvals.size)  # one noise sample per x value
fig, lineplot = plt.subplots()
lineplot.plot(xvals, yvals, '.')
lineplot.plot(xvals, func(xvals))
The above structure can be used to display linear regression models. However, in regression, instead of generating data from a predetermined slope and y-intercept, we are given the data and have to solve for the slope and y-intercept that fit it best. Typically, these are the pair that minimizes the sum of squared residuals (ordinary least squares).
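For a single input variable, the least-squares slope and intercept have a closed form: $\hat{m} = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2}$ and $\hat{b} = \bar{y} - \hat{m}\bar{x}$. The sketch below implements this formula with a hypothetical helper, `least_squares_line`, and compares it with NumPy's `np.polyfit` on fresh data generated like the data above. It uses new variable names and a fixed seed so it does not overwrite the notebook's `xvals` and `yvals`.

```python
import numpy as np

def least_squares_line(x, y):
    """Slope and intercept minimizing the sum of squared residuals (closed-form OLS)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = x.mean(), y.mean()
    slope = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    intercept = y_mean - slope * x_mean
    return slope, intercept

rng = np.random.default_rng(0)            # fixed seed for reproducibility
x_demo = np.arange(0, 101, 1)
y_demo = 3 * x_demo + 4 + rng.normal(0, 50, x_demo.size)

slope, intercept = least_squares_line(x_demo, y_demo)
# np.polyfit with deg=1 solves the same problem and returns [slope, intercept]
print(slope, intercept, np.polyfit(x_demo, y_demo, 1))
```

The fitted line could then be drawn over the scatterplot with `lineplot.plot(x_demo, slope * x_demo + intercept)`. With $\sigma = 50$, the estimates will be close to, but not exactly, 3 and 4.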